Skip to content

Conversation

@ckoparkar
Copy link
Member

Fixes #161036.

@ckoparkar ckoparkar marked this pull request as ready for review October 2, 2025 14:02
@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Oct 2, 2025
@ckoparkar
Copy link
Member Author

/cc @RKSimon

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Chaitanya Koparkar (ckoparkar)

Changes

Fixes #161036.


Full diff: https://github.com/llvm/llvm-project/pull/161651.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+22)
  • (added) llvm/test/CodeGen/X86/underflow-compare-fold.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 558c5a0390228..99d7000c3b62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -6199,6 +6199,28 @@ SDValue DAGCombiner::visitIMINMAX(SDNode *N) {
                                         SDLoc(N), VT, N0, N1))
     return SD;
 
+  // (umin (sub a, b) a) -> (usubo a, b); (select usubo.1, a, usubo.0)
+  //
+  // IR:
+  //   %sub  = sub %a, %b
+  //   %cond = umin %sub, %a
+  //   ->
+  //   %usubo    = usubo %a, %b
+  //   %overflow = extractvalue %usubo, 1
+  //   %sub      = extractvalue %usubo, 0
+  //   %cond     = select %overflow, %a, %sub
+  if (N0.getOpcode() == ISD::SUB) {
+    SDValue A, B, C;
+    if (sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Value(C)))) {
+      EVT AVT = A.getValueType();
+      if (A == C && TLI.isOperationLegalOrCustom(ISD::USUBO, AVT)) {
+        SDVTList VTs = DAG.getVTList(AVT, MVT::i1);
+        SDValue USO = DAG.getNode(ISD::USUBO, DL, VTs, A, B);
+        return DAG.getSelect(DL, VT, USO.getValue(1), A, USO.getValue(0));
+      }
+    }
+  }
+
   // Simplify the operands using demanded-bits information.
   if (SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);
diff --git a/llvm/test/CodeGen/X86/underflow-compare-fold.ll b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
new file mode 100644
index 0000000000000..2416bcb909485
--- /dev/null
+++ b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s
+
+; GitHub issue #161036
+
+define i64 @subIfNoUnderflow_umin(i64 %a, i64 %b) {
+; CHECK-LABEL: subIfNoUnderflow_umin
+; CHECK-LABEL: %bb.0
+; CHECK-NEXT: movq    %rdi, %rax
+; CHECK-NEXT: subq    %rsi, %rax
+; CHECK-NEXT: cmovbq  %rdi, %rax
+; CHECK-NEXT: retq
+entry:
+  %sub = sub i64 %a, %b
+  %cond = tail call i64 @llvm.umin.i64(i64 %sub, i64 %a)
+  ret i64 %cond
+}

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add aarch64 test coverage

@ckoparkar
Copy link
Member Author

@RKSimon this is ready for another review.

%sub = sub i64 %a, %b
%cond = tail call i64 @llvm.umin.i64(i64 %sub, i64 %a)
ret i64 %cond
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test same pattern with vectors. Also add multiple use test, and illegal type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some tests with vector types, incorrect patterns, etc.

I didn't guard against multiple uses here since I thought the results of sub and umin are still available after the fold. Am I misunderstanding something here and should I be guarding the pattern with m_OneUse?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point isn't whether it's correct or not, but whether it's worthwhile. If you aren't eliminating the instruction, is it worth doing the fold (I suppose maybe if the min will be expanded)

@ckoparkar
Copy link
Member Author

Apologies, I was traveling for a few days. I'll update this today.

@ckoparkar ckoparkar marked this pull request as draft October 22, 2025 12:39
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you update against trunk latest and investigate the CI failure

@ckoparkar
Copy link
Member Author

  1. Test failure

Previously the code was:

if (N0.getOpcode() == ISD::SUB) {
  sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Deferred(A)))
  ...

Once I removed the if (N0.getOpcode() == ISD::SUB), this pattern match grabbed this:

  SelectionDAG has 17 nodes:
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
                t8: i32 = sub t2, Constant:i32<4>
              t9: i32 = umin t2, t8


N: t9: i32 = umin t2, t8
A: t2: i32,ch = CopyFromReg t0, Register:i32 %0
B: t7: i32 = Constant<4>

t8: i32 = sub.. is the second argument of umin so I'm kind of confused as to why this pattern match succeeded. I'm now pattern matching on individual arguments of umin.

  1. More tests

I've added some tests with vector types, incorrect patterns, etc. @arsenm also mentioned a test for multiple use.

I didn't guard against multiple uses here since I thought the results of sub and umin are still available after the fold. Am I misunderstanding something here and should I be guarding the pattern with m_OneUse?

* main: (3124 commits)
  [X86] narrowBitOpRMW - add handling for single bit insertion patterns (llvm#165742)
  [gn build] Port 5322fb6
  [libc++] Simplify the implementation of destroy_at a bit (llvm#165392)
  [MLIR][NVVM] Update mbarrier.init/inval Ops to use AnyTypeOf[] (llvm#165558)
  [X86] Remove AMX-TRANSPOSE (llvm#165556)
  [CIR] Fix multiple returns in switch statements (llvm#164468)
  [lld][test] Fix file cleanup in aarch64-build-attributes.s (llvm#164396)
  [X86] combineTruncate - trunc(srl(load(p),amt)) -> load(p+amt/8) - ensure there isn't an interdependency between the load and amt (llvm#165850)
  [llvm][docs] Remove guidance on adding release:reviewed label (llvm#164395)
  [libc++] Update our documentation on the supported compilers (llvm#165684)
  [AMDGPU][GlobalISel] Add register bank legalization for G_FADD (llvm#163407)
  [LLVM][ConstantFolding] Extend constantFoldVectorReduce to include scalable vectors. (llvm#165437)
  [SDAG] Set InBounds when when computing offsets into memory objects (llvm#165425)
  [llvm][DebugInfo][ObjC] Make sure we link backing ivars to their DW_TAG_APPLE_property (llvm#165409)
  [NFCI] Address post-merge review of llvm#162503 (llvm#165582)
  [clang][tools][scan-view] Remove Python2 compatibility code in ScanView.py (llvm#163747)
  [lldb][TypeSystem] Remove count parameter from TypeSystem::IsFloatingPointType (llvm#165707)
  [llvm][tools][opt-viewer] Put back missing function
  [llvm][tools][opt-viewer] Remove Python2 compatability code in optrecord.py (llvm#163744)
  [clang][utils] Make CmpDriver Python3 compatible (llvm#163740)
  ...
* main:
  [SPIRV] Fix vector bitcast check in LegalizePointerCast (llvm#164997)
  [lldb][docs] Add troubleshooting section to scripting introduction
  [Sema] Fix parameter index checks on explicit object member functions (llvm#165586)
  To fix polymorphic pointer assignment in FORALL when LHS is unlimited polymorphic and RHS is intrinsic type target (llvm#164999)
  [CostModel][AArch64] Model cost of extract.last.active intrinsic (clastb) (llvm#165739)
  [MemProf] Select largest of matching contexts from profile (llvm#165338)
  [lldb][TypeSystem] Better support for _BitInt types (llvm#165689)
  [NVPTX] Move TMA G2S lowering to Tablegen (llvm#165710)
  [MLIR][NVVM] Extend NVVM mma ops to support fp64 (llvm#165380)
  [UTC] Support to test annotated IR (llvm#165419)
@ckoparkar ckoparkar marked this pull request as ready for review October 31, 2025 14:28
@ckoparkar ckoparkar requested a review from RKSimon October 31, 2025 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[x86][armv8] Failure to use flag in unsigned underflow compare idiom

4 participants